Lin Qiu

Paul G. Allen School of Computer Science & Engineering, University of Washington, United States

Beyond Instrumental and Substitutive Paradigms: Introducing Machine Culture as an Emergent Phenomenon in Large Language Models

Jan 23, 2026
Via arXiv

LongCat-Flash-Thinking-2601 Technical Report

Jan 23, 2026
Via arXiv

CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions

Oct 30, 2025
Via arXiv

Instance-level Randomization: Toward More Stable LLM Evaluations

Sep 16, 2025
Via arXiv

OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

Jun 12, 2025
Via arXiv

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese

May 16, 2025
Via arXiv

Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

Mar 05, 2025
Via arXiv

Contextualizing biological perturbation experiments through language

Feb 28, 2025
Via arXiv

Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Feb 17, 2025
Via arXiv

Learning Identifiable Factorized Causal Representations of Cellular Responses

Oct 29, 2024
Via arXiv